Abstract

This paper presents a general approach to comparing Credit Scoring
techniques. A scorecard can be built in various ways, depending on
how variables are transformed in the logistic regression model.
Proving that one technique is better than another is a major
challenge because of the limited availability of data: the same
conclusion cannot be guaranteed for data from another source. The
following research question can therefore be formulated: how should
the comparison be managed in order to obtain general results that
are not biased by particular data? The solution may lie in the use
of various random data generators. The data generator uses two
approaches: a transition matrix and scorings. Both the results of
the method comparison and the methodology for constructing such
comparisons are presented. Before building a new model, the modeler
can undertake a comparison exercise aimed at identifying the best
method for the particular data. Various measures of a predictive
model are presented, such as Gini, Delta Gini, VIF and maximal
p-value, emphasizing the multi-criteria nature of a "good model".
The suggested idea is particularly useful in a model building
process where complex criteria are defined to cover the important
problems of model stability over time, in order to avoid a crisis.
Some arguments for choosing the Logit or WOE approach as the best
scorecard technique are presented.

Key words: credit scoring, crisis analysis, banking data generator, 
retail portfolio, scorecard building, predictive modeling. 

1 Introduction 

Credit Scoring is now applied in many business areas. It is especially
important in the banking sector [3], where it is used to optimize credit
acceptance processes and to build the PD (probability of default) models
used in Basel II and III for RWA (Risk Weighted Assets) calculations [1].

Its influence on business processes has made Credit Scoring a popular and
well-known field, yet it remains an area that requires further
development. Various consultancy companies and corporations, because the
field can be very profitable, often formulate so-called expert statements
or methods without having conducted extensive and fully scientific
research. Sometimes this is due to legal constraints that do not allow
advanced research to be conducted on real data coming from banking
processes.

Yet the current crisis demands that researchers focus on better
predictive modeling, especially on models with better stability of risk
over time [5].

All the above-mentioned arguments suggest the following basic questions:

• Is it possible to conduct Credit Scoring research without any real data?

• Can a method be formulated that will enable one technique to be
compared with another without particular real data or, in other words,
can a general data repository for comparisons be created?

• Can such a general Credit Scoring repository be made available to all
interested parties, and will it contain enough particular cases to be
truly general?



2 Data used for analysis 



Two kinds of real data from quite different areas (banking and
medicine) are used to present the idea of a random data generator as a
source of generalized data for Credit Scoring.



2.1 Real banking data 

Banking data are taken from the Consumer Finance division of one of the
Polish banks. There are 50,000 rows and 134 columns. Column names are
secured, i.e. not disclosed. The target variable represents the typical
default event: delinquency of more than 60 days past due during the 6
months following the observation point.



2.2 Medical real data 

The medical data represent breast cancer survivability in the USA [2].
The data come from the Surveillance, Epidemiology and End Results
repository (http://seer.cancer.gov, accessed 30 August 2012). There are
1,343,646 rows and 40 columns. The target function represents either
survival or death due to cancer during the 5 years following diagnosis.
The advantage of this data is the large number of rows available, a
situation unlike that found in the real banking field.



2.3 Random data generator

The Consumer Finance data generator is described in [6]. The general idea
is based on a Markov process with a transition matrix. The matrix changes
over time due to the impact of one macroeconomic variable, which results
in cyclic risk over time. Every new month of data is created on the basis
of a score calculated for all credit accounts; cases with greater
delinquency have worse scores. Their shares are tied to particular
transition matrix coefficients. Even though the scoring formula for the
following months is known, scoring models built in the conventional
manner are based on different target functions and can be quite different
from the one in the data generator. Despite its simple construction, the
data generator can be extended and further developed for various
portfolios: with small, medium and large risk values (using different
transition matrices), with small, medium and large periodicity, and with
different time-dependent scoring rules. It is a very flexible way of
creating data and it provides comprehensive information about the
process because, unlike in real banking data, the information is not
secured. All variables and the various forms of characteristics that are
created can therefore be interpreted. The dataset contains 2,694,377 rows
and 56 columns.
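
The mechanism described above can be illustrated with a minimal sketch. It
is not the generator of [6]: the four delinquency states, the base matrix,
the sinusoidal macroeconomic factor and the score formula are all
assumptions made for illustration. The sketch only shows how a time-varying
transition matrix, combined with a score-based assignment of accounts (the
worst scores receive the worst transitions), can produce cyclic risk.

```python
import math
import random

# Hypothetical delinquency states (an assumption, not taken from [6]):
# 0 = no arrears, 1 = 1 payment due, 2 = 2 payments due, 3 = 3 or more due.
BASE_MATRIX = [
    [0.90, 0.10, 0.00, 0.00],
    [0.50, 0.30, 0.20, 0.00],
    [0.20, 0.20, 0.30, 0.30],
    [0.05, 0.05, 0.10, 0.80],
]


def macro_factor(month, period=24, amplitude=0.3):
    """Illustrative cyclic macroeconomic variable oscillating around 1."""
    return 1.0 + amplitude * math.sin(2.0 * math.pi * month / period)


def transition_matrix(month):
    """Scale the probabilities of moving to a worse state, then renormalize rows."""
    factor = macro_factor(month)
    matrix = []
    for i, row in enumerate(BASE_MATRIX):
        scaled = [p * factor if j > i else p for j, p in enumerate(row)]
        total = sum(scaled)
        matrix.append([p / total for p in scaled])
    return matrix


def next_states(states, scores, matrix):
    """Within each current state, the best-scored accounts move to the best states.

    The share of accounts sent to each target state follows the matrix row.
    """
    new_states = list(states)
    for current, row in enumerate(matrix):
        members = [i for i, s in enumerate(states) if s == current]
        members.sort(key=lambda i: scores[i], reverse=True)  # best score first
        start, cumulative = 0, 0.0
        for target, probability in enumerate(row):
            cumulative += probability
            last = target == len(row) - 1
            end = len(members) if last else round(cumulative * len(members))
            for i in members[start:end]:
                new_states[i] = target
            start = end
    return new_states


def simulate(n_accounts=2000, n_months=48, seed=0):
    """Return the monthly share of accounts with 2 or more payments due."""
    rng = random.Random(seed)
    quality = [rng.gauss(0.0, 1.0) for _ in range(n_accounts)]  # fixed per account
    states = [0] * n_accounts
    bad_share = []
    for month in range(n_months):
        # Higher score = better account; more delinquent accounts score worse.
        scores = [q - s + rng.gauss(0.0, 1.0) for q, s in zip(quality, states)]
        states = next_states(states, scores, transition_matrix(month))
        bad_share.append(sum(s >= 2 for s in states) / n_accounts)
    return bad_share
```

Calling simulate() returns one bad-rate value per month. Changing
BASE_MATRIX, amplitude or period corresponds to the portfolios with
different risk levels and periodicity mentioned above.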



3 Steps to follow in scorecard model building 



For all three kinds of data the same algorithms for building predictive
models are run. All calculations are made in the SAS System (SAS Institute
Inc., http://www.sas.com, accessed 30 August 2012), using the Base SAS,
SAS/STAT and SAS/GRAPH modules.

• Random samples - data partitioning. Two datasets are created, training
and validating, taken at different times, the validating data being
taken later. This method, called time sampling, allows a model's
stability over time to be studied.

• Attribute creation - binning. Every continuous variable is categorized
into an ordinal variable using an entropy-based measure. Some
categorical variables are also changed by joining categories with
similar risk measures. These methods are usually implemented in
decision tree techniques.

• Variable pre-selection - the dropping of insignificant variables. At
this stage variables are excluded on the basis of simple one-dimensional
criteria, as they are considered to have little chance of being useful
in the next steps. The predictive powers of single variables and their
stability over time are examined. Variables with small powers or those
that are significantly unstable are deleted.

• Multi-factor variable selection - lists of many models. The SAS
Logistic procedure implements a heuristic selection method for
continuous variables based on the branch and bound technique [3]. It is
an extremely useful method for producing many models, namely 700 models:
the best 100 models with 6 variables, with 7 variables, ... and with 12
variables.

• Model assessment. There is no single, unique criterion of a good
model. Instead, a set of criteria is employed: predictive power, AR (in
other words Gini [1]); stability, AR_diff or delta Gini (the relative
difference between predictive powers on the training and validating
datasets); collinearity measures, MAX_VIF (maximal variance inflation
factor), MAX_Pearson (maximal Pearson correlation coefficient over pairs
of variables) and MAX_ConIndex (maximal condition index); and
significance measures, MAX_ProbChiSquare (maximal p-value for variables
in the model).
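
Two of the assessment criteria listed above can be sketched in a few
lines. The functions below are an illustration, not the SAS implementation
used in the paper. Gini is computed as 2 * AUC - 1 from ranks, and delta
Gini is assumed to be (Gini_train - Gini_valid) / Gini_train, which is one
common reading of "relative difference"; the function names are
hypothetical.

```python
def gini(scores, targets):
    """Gini (accuracy ratio) = 2 * AUC - 1; a higher score means higher risk.

    targets are 1 for bad (default) cases and 0 for good cases.
    """
    pairs = sorted(zip(scores, targets))
    n_bad = sum(targets)
    n_good = len(targets) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both good and bad cases are required")
    # Sum of the ranks of bad cases (Mann-Whitney), using average ranks for ties.
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum += average_rank * sum(t for _, t in pairs[i:j])
        i = j
    auc = (rank_sum - n_bad * (n_bad + 1) / 2.0) / (n_bad * n_good)
    return 2.0 * auc - 1.0


def delta_gini(gini_train, gini_valid):
    """Relative drop of predictive power between training and validating data."""
    return (gini_train - gini_valid) / gini_train
```

A model that separates the classes perfectly gets a Gini of 1, a model that
scores everyone identically gets 0, and a small delta Gini indicates that
predictive power is stable between the two samples.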

4 Different variable coding and selection 

A scoring model, even when based on the same set of variables, can be
estimated by logistic regression in various ways, depending on the coding
of the variables.

The first way, called REG, is a model without any variable
transformation. In this case a missing-value imputation step is necessary;
it is certainly not trivial and can be quite important. However, the REG
method is considered here only as an additional reference scale (a
mirror), so the simplest method, imputation by the mean, can be employed.

The second way, called LOG, is based on the logit transformation: for
every attribute (after binning) its logit is calculated. The transformed
variable becomes piecewise constant and discrete (quasi-continuous). This
way is useful because missing-value imputation is not required: a missing
value can be assigned to a separate attribute or combined with other
values, depending on the binning criteria. Moreover, this method treats
qualitative and quantitative variables in the same way; in the end all
variables are binned and transformed into a logit structure. This is
similar to the WOE approach used in the SAS Credit Scoring Solution [8].
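
A small sketch of the transformation may help. The paper does not state
the formulas, so the definitions here are assumptions based on common
usage: the logit of an attribute is ln(bad / good) within that attribute,
and WOE is ln((good_i / good_total) / (bad_i / bad_total)). The function
names and the sample data are hypothetical. Missing values are simply
treated as one more attribute.

```python
import math
from collections import Counter


def bin_counts(attributes, targets):
    """Count good (target 0) and bad (target 1) cases per attribute."""
    bad, total = Counter(), Counter()
    for attribute, target in zip(attributes, targets):
        total[attribute] += 1
        bad[attribute] += target
    return {a: (total[a] - bad[a], bad[a]) for a in total}


def logit_by_attribute(attributes, targets):
    """logit = ln(bad / good) within each attribute.

    Attributes with no good or no bad cases must be merged beforehand,
    otherwise the logarithm is undefined.
    """
    counts = bin_counts(attributes, targets)
    return {a: math.log(b / g) for a, (g, b) in counts.items()}


def woe_by_attribute(attributes, targets):
    """WOE = ln(share of all goods in the attribute / share of all bads)."""
    counts = bin_counts(attributes, targets)
    good_total = sum(g for g, _ in counts.values())
    bad_total = sum(b for _, b in counts.values())
    return {
        a: math.log((g / good_total) / (b / bad_total))
        for a, (g, b) in counts.items()
    }


# Hypothetical binned variable with a separate attribute for missing values.
attributes = ["low"] * 4 + ["high"] * 4 + ["missing"] * 2
targets = [1, 0, 0, 0, 1, 1, 0, 0, 1, 0]
logits = logit_by_attribute(attributes, targets)
woes = woe_by_attribute(attributes, targets)
```

Under these definitions WOE and logit differ only by sign and by a constant
(the logarithm of the overall bad-to-good ratio), which is why the two
approaches lead to equivalent models.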

The third way, GRP, is connected to the binary coding called reference or
dummy coding; see Table 2. The reference level is set at the attribute
with the lowest risk. Other solutions, where the reference level is set,
for example, at the most representative attribute (the one with the
greatest share), can also be considered, though this is a topic for
further research. Dummy coding produces a large number of binary
variables, and it is not easy to run the heuristic branch and bound
variable selection method because the calculation time grows
prohibitively with the number of variables. It is a typical case of the
familiar NP-complete problem. Moreover, the company Score Plus [7] rightly
suggests running the selection method on a better coding called ordinal
or nested; see Tables 3, 4 and 5. With the last-mentioned coding method,
all betas in a model with one variable have the same sign, but this
experimental fact requires formal proof.
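
The codings discussed above can be generated programmatically. The
following is a minimal sketch, assuming a variable with k groups and the
0/1 patterns of Tables 2 to 5 (with the last group as the all-zero
reference level in the dummy coding); the function names are hypothetical.

```python
def reference_coding(k):
    """Dummy coding: group g has a single 1 in column g; the last group is all zeros."""
    return [[1 if col == g else 0 for col in range(k - 1)] for g in range(k)]


def nested_descending(k):
    """Rows 000, 100, 110, 111 for k = 4: group g has ones in its first g columns."""
    return [[1 if col < g else 0 for col in range(k - 1)] for g in range(k)]


def nested_ascending(k):
    """Rows 111, 011, 001, 000 for k = 4: group g has ones from column g onwards."""
    return [[1 if col >= g else 0 for col in range(k - 1)] for g in range(k)]


def nested_monotonic(k):
    """Rows 111, 110, 100, 000 for k = 4: one trailing 1 is dropped per group."""
    return [[1 if col < k - 1 - g else 0 for col in range(k - 1)] for g in range(k)]


def partial_scores(coding, betas):
    """Linear contribution of each group: the sum of the betas switched on in its row."""
    return [sum(x * b for x, b in zip(row, betas)) for row in coding]
```

With nested coding each group's contribution is a cumulative sum of betas,
so betas of the same sign give contributions that are monotonic across
groups. For example, partial_scores(nested_descending(4), [0.5, 0.3, 0.2])
yields contributions that increase from group 1 to group 4.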

In the cases of REG and LOG a single beta is estimated for every variable
in the model. For the GRP method a beta is estimated separately for every
attribute, so the number of parameters in the model is about 6 times
greater (assuming 7 attributes per variable, one of which is the
reference level). Another good research topic would be the diagnostics of
GRP models: the correctness of their estimation, the minimal sample size
and the power of statistical tests. Intuition suggests that care should
be taken here because models can be overfitted.

In the case of GRP, because heuristic variable selection is not
available, all variable combinations resulting from the REG and LOG
methods are taken, and all of them are estimated by the GRP method.

In practice it is often the case that with the GRP method some attributes
are not significant although the variable as a whole is significant,
especially according to "TYPE 3" tests. It is not advisable to retain an
insignificant attribute in the final model. What is needed is a new
sub-method that eliminates insignificant attributes when using the GRP
way. Without that step GRP does not provide models good enough to be a
serious competitor to LOG. A procedure for the elimination of
insignificant attributes, called attribute adjustment, should therefore
be devised. Two simple algorithms are chosen here, backward and stepwise,
both available in the SAS Logistic procedure.

The model can be estimated with dummy or nested coding. Finally,
therefore, 12 attribute adjustment methods are created (2 estimation
codings × 2 selection algorithms × 3 nested codings); see Table 6.

Table 1: Example of scorecard model.

Variable   Condition (attribute)   Partial score

Age        < 20                    10
           < 35                    20
           < 60                    40

Income     < 1500                  15
           < 3500                  26
           < 6000                  49

Table 2: Reference coding - dummy.

Group number   Variable1   Variable2   Variable3

1              1           0           0
2              0           1           0
3              0           0           1
4              0           0           0

All models except REG are scorecard models; see Table 1.

5 Results 



For every kind of data, training and validating sample datasets are
created and variables are pre-selected; see Table 7.

In the next step 700 models each are calculated for REG and LOG. Then
1,400 models are estimated by the GRP method. Every GRP model is then
adjusted by all 12 methods. In summary, about 19,600 models
(700 + 700 + 1,400 + 12 × 1,400) are created and estimated for every kind
of data, so about 58,800 models in total. Such a large number of models,
with their various criteria statistics, makes it possible to study the
distributions of these criteria and to make a thorough comparison based
on distribution properties.

Table 3: Cumulative descending coding - nested descending (ordinal).

Group number   Variable1   Variable2   Variable3

1              0           0           0
2              1           0           0
3              1           1           0
4              1           1           1

Source: SAS Institute Inc. 2002-2010. SAS/STAT 9.2: Proc Logistic - User's Guide, Other Parameterizations.

Table 4: Cumulative ascending coding - nested ascending.

Group number   Variable1   Variable2   Variable3

1              1           1           1
2              0           1           1
3              0           0           1
4              0           0           0

Table 5: Cumulative monotonic coding - nested monotonic.

Group number   Variable1   Variable2   Variable3

1              1           1           1
2              1           1           0
3              1           0           0
4              0           0           0


Table 6: Attribute adjustments for GRP models

Method name   Estimation   Selection   Coding

NBA           nested       backward    ascending nested
NBD           nested       backward    descending nested
NBM           nested       backward    monotonic nested
NSA           nested       stepwise    ascending nested
NSD           nested       stepwise    descending nested
NSM           nested       stepwise    monotonic nested
DBA           dummy        backward    ascending nested
DBD           dummy        backward    descending nested
DBM           dummy        backward    monotonic nested
DSA           dummy        stepwise    ascending nested
DSD           dummy        stepwise    descending nested
DSM           dummy        stepwise    monotonic nested

Table 7: Sample sizes

Data source   Training   Validating   Number of chosen variables

Banking       27,325     12,435       60
Medical       29,893     17,056       23
Random        66,998     38,199       33


All calculations are made on a simple laptop (Core Duo, 1.67 GHz) and
take about 2 months of uninterrupted computation to complete.

6 Interpretation 

Fifteen predictive modeling techniques are calculated and compared: REG,
LOG, GRP and the 12 attribute adjustments. For every technique, in order
to avoid a scale problem, the 700 best models are initially selected. The
selection is based on AR_valid, i.e. predictive power (the Gini
statistic) on the validating dataset.

Figures 1, 2 and 3 present one-dimensional distributions of a few model
criteria: prediction, stability and collinearity. The main differences in
prediction, measured by AR_valid, are between the REG, LOG and GRP
models. All GRP adjustments have similar results. The same conclusion
holds for stability, measured by AR_diff. For collinearity there are
significant differences: GRP adjustments strongly improve MAX_VIF, and
for LOG models almost all values concentrate around an acceptable level.

A one-dimensional approach is unable to identify the best scoring
techniques correctly, because even if a model has the best prediction it
can also have the worst stability, and so ought to be excluded from the
list of suitable candidates. The better approach is to analyze a
multi-dimensional criterion, where all model statistics are taken
together and the distance from the ideal model is defined. The ideal
model is the "crystal ball": the highest prediction (100%), null
collinearity and null instability. In practice not all criteria have the
same weights, and it is not a trivial problem to define the proper
priorities. Figures 4, 5 and 6 present three cases with different
relations between the weights for prediction and stability: equal
weights, a smaller weight for prediction and a greater weight for
prediction. A lower note means a better model, one that is closer to the
ideal model. This manner of data presentation gives quite interesting
results. REG models lie significantly far from the ideal model for every
type of data. GRP has too large a variance and is also not close to the
ideal model. LOG models have desirable notes, but consistently fail to
have the lowest note, i.e. the minimal distance to the ideal model. Some
GRP adjustments have the best properties, especially models estimated
with nested coding. Furthermore, all adjustments with monotonic coding
are concentrated around very good levels, almost always at the minimal
distance from the ideal model.

Figure 1: One-dimensional distributions - prediction.

Figure 2: One-dimensional distributions - stability.

Figure 3: One-dimensional distributions - collinearity.

Figure 4: Multidimensional approach. Stability and prediction with the
same weights.

Figure 5: Multidimensional approach. Stability with greater weight than
prediction.

Figure 6: Multidimensional approach. Prediction with greater weight than
stability.
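
The multi-dimensional note can be illustrated as follows. The paper does
not give the exact formula, so this sketch assumes a weighted Euclidean
distance from the ideal point (AR_valid = 1, AR_diff = 0, VIF = 1); the
function name, the scaling of VIF and the two example models are
hypothetical.

```python
import math


def distance_from_ideal(ar_valid, ar_diff, max_vif,
                        weights=(1.0, 1.0, 1.0), vif_scale=10.0):
    """Weighted distance of a model from the 'crystal ball' model.

    weights = (prediction, stability, collinearity); lower is better.
    vif_scale is an assumed constant that brings VIF to a comparable scale.
    """
    w_prediction, w_stability, w_collinearity = weights
    prediction_gap = 1.0 - ar_valid                      # ideal: 100% prediction
    stability_gap = abs(ar_diff)                         # ideal: no instability
    collinearity_gap = max(0.0, max_vif - 1.0) / vif_scale  # ideal: VIF = 1
    return math.sqrt(
        w_prediction * prediction_gap ** 2
        + w_stability * stability_gap ** 2
        + w_collinearity * collinearity_gap ** 2
    )


# Two hypothetical models: A predicts better, B is more stable.
model_a = (0.60, 0.15, 2.0)
model_b = (0.55, 0.02, 2.0)

equal = [distance_from_ideal(*m) for m in (model_a, model_b)]
stability_first = [
    distance_from_ideal(*m, weights=(1.0, 10.0, 1.0)) for m in (model_a, model_b)
]
```

With equal weights model A has the lower note, while a tenfold weight on
stability reverses the ranking in favour of model B, which is why the
choice of weights matters.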

From amongst all the adjustment methods, NBM (nested, backward, monotonic
nested) ought to be highlighted as a good method on the basis of the
results presented, with the added bonus of simple implementation and
short calculation time. In conclusion, only two methods are chosen for
further analysis: LOG and NBM.



7 Final comparison: LOG contra NBM 



Based on many 3D analyses, which cannot be presented in this paper, only
the two most important criteria for identifying significant differences
between the LOG and NBM methods are chosen: prediction AR_valid and
stability AR_diff are sufficient to present the final comparison. Figures
7, 8 and 9 present scatter plots of these two statistics for the three
kinds of data. Here the real values from the modeling process are
presented, without any scaling. It can be seen that the LOG method
(represented by stars in the figures) provides slightly more stable
models than NBM (represented by gray circles), with slightly lower
predictive power. Because the difference is not very marked, and models
with similar properties can almost always be found using both methods, it
is suggested that the simpler method, LOG, be used. Moreover, of these
two criteria, the more conservative approach is to prefer models with
better stability over models with greater prediction. So, finally, after
various analyses of 15 scoring techniques, the LOG method is the simplest
and the best method for building good models where, for example, the
modeler does not have enough time. In other cases it is suggested always
to make a serious analysis of all known and available scoring techniques,
because the best method is a spectrum of methods.

8 Conclusion



Despite the three different kinds of data (banking, medical and random),
all the comparison results for the various scoring techniques seem to
converge and give the same conclusions. In other words, the conclusion
can be formulated that the research method for comparing scoring
techniques presented in the paper is independent of the data and is not
biased by particular data structures. This is a very useful result: it
prompts further research and makes it possible to focus on one more
available data type, random data. Moreover, the comparison technique
presented can always be updated for new data. Before building a new
model, analysts can always run the technique presented here in order to
see the results coming directly from their own data, even if they prefer
a method based on their own experience. The one disadvantage is the
calculation time. This argument suggests beginning with analyses on
random data, because they are always available and can be published
without any special restrictions. The random data can, of course, be
created in various ways and can always be improved upon or altered in
order to obtain better and more general conclusions.

It would now seem possible to answer the main question about the
possibility of research in Credit Scoring without real data. Even though
the results presented for the three kinds of data show some small
differences, the general message is that it is possible to create a
General Credit Scoring Data Repository based on random generators.